Nonparametric Depth-Based Multivariate Outlier Identifiers, and Masking Robustness Properties
Abstract
In extending univariate outlier detection methods to higher dimension, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on multivariate depth functions, which can generate contours following the shape of the data set. Also, we study masking robustness, that is, robustness against misidentification of outliers as nonoutliers. In particular, we define a masking breakdown point (MBP), adapting to our setting certain ideas of Davies and Gather (1993) and Becker and Gather (1999) based on the Mahalanobis distance outlyingness. We then compare four affine invariant outlier detection procedures, based on Mahalanobis distance, halfspace or Tukey depth, projection depth, and “Mahalanobis spatial” depth. For the goal of threshold type outlier detection, it is found that the Mahalanobis distance and projection procedures are distinctly superior in performance, each with very high MBP, while the halfspace approach is quite inferior. When a moderate MBP suffices, the Mahalanobis spatial procedure is competitive in view of its contours not constrained to be elliptical and its computational burden relatively mild. A small sampling experiment yields findings completely in accord with the theoretical comparisons. While these four depth procedures are relatively comparable for the purpose of robust affine equivariant location estimation, the halfspace depth is not competitive with the others for the quite different goal of robust setting of an outlyingness threshold.
AMS 2000 Subject Classification: Primary 62G10; Secondary 62H99.
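To make the idea of threshold type outlier detection concrete, here is a minimal sketch of the simplest of the four procedures named above, Mahalanobis distance outlyingness. It is an illustration under stated assumptions, not the paper's procedure: it is restricted to two dimensions, it uses the classical sample mean and covariance rather than any robust estimates, and the function names and the threshold value of 1.9 are hypothetical choices made for this example only.

```python
import math


def mahalanobis_outlyingness(points):
    """Mahalanobis distance of each 2-D point from the sample mean.

    Assumption: classical (non-robust) sample mean and covariance,
    chosen only to keep the sketch short.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix (divisor n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form d' S^{-1} d with the 2x2 inverse written out.
        q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(q))
    return distances


def flag_outliers(points, threshold):
    """Indices of points whose outlyingness exceeds the threshold."""
    scores = mahalanobis_outlyingness(points)
    return [i for i, d in enumerate(scores) if d > threshold]


# Hypothetical data: five points near the unit square plus one distant point.
sample = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
flagged = flag_outliers(sample, threshold=1.9)
```

With this data only the distant point exceeds the threshold. Because the classical mean and covariance are themselves pulled toward outliers, a threshold rule built on them can be masked when several outliers are present, which is the kind of failure a masking breakdown point is meant to quantify.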
Similar resources
Nonparametric Depth-Based Multivariate Outlier Identifiers, and Robustness Properties
In extending univariate outlier detection methods to higher dimension, various special issues arise, such as limitations of visualization methods, inadequacy of marginal methods, lack of a natural order, limited scope of parametric modeling, and restriction to ellipsoidal contours when using Mahalanobis distance methods. Here we pass beyond these limitations via an approach based on depth funct...
A numerical study of multiple imputation methods using nonparametric multivariate outlier identifiers and depth-based performance criteria with clinical laboratory data
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation methods tend to impute non-extreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of multiple im...
General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers
With greatly advanced computational resources, the scope of statistical data analysis and modeling has widened to accommodate pressing new arenas of application. In all such data settings, an important and challenging task is the identification of outliers. Especially, an outlier identification procedure must be robust against the possibilities of masking (an outlier is undetected as such) and ...
On Masking and Swamping Robustness of Leading Outlier Identifiers for Univariate Data
In the wide-ranging scope of modern statistical data analysis, a key task is identification of outliers. In using an outlier identification procedure, one needs to know its robustness against masking (an “outlier” is undetected) and swamping (a “nonoutlier” is classified as an “outlier”), possibilities which can come about due to the presence of outliers. Study of these issues together is neces...
Survey on (Some) Nonparametric and Robust Multivariate Methods
Rather than attempt an encyclopedic survey of nonparametric and robust multivariate methods, we limit to a manageable scope by focusing on just two leading and pervasive themes, descriptive statistics and outlier identification. We set the stage with some perspectives, and we conclude with a look at some open issues and directions. A variety of questions are raised. Is nonparametric inference t...